AutoBayes 

Program Synthesis System 
System Internals 


Johann Schumann, SGT, Inc. 


NASA Ames Research Center 
Version: Nov, 2011 



Preface 


This document is a draft describing many important concepts and details of
AutoBayes, which should be helpful in understanding the internals of AutoBayes
and in extending the AutoBayes system. Details on installing and using
AutoBayes, as well as many example specifications, can be found in the AutoBayes
manual (URL given at the end of this preface).

This version of the document contains the supplemental information for the lecture 
on schema-based synthesis and AutoBayes, presented at the 2011 Summerschool on 
Program Synthesis (Dagstuhl, 2011). 

This lecture combines the theoretical background of schema-based program synthesis
with the hands-on study of a powerful, open-source program synthesis system
(AutoBayes).

Schema-based program synthesis is a popular approach to program synthesis.
The lecture will provide an introduction to this topic and discuss how this technology
can be used to generate customized algorithms.

The synthesis of advanced numerical algorithms requires the availability of a power- 
ful symbolic (algebra) system. Its task is to symbolically solve equations, simplify 
expressions, or to symbolically calculate derivatives (among others) such that the 
synthesized algorithms become as efficient as possible. We will discuss the use and 
importance of the symbolic system for synthesis. 

Any synthesis system is a large and complex piece of code. In this lecture, we will
study AutoBayes in detail. AutoBayes has been developed at NASA Ames and has
been made open source. It takes a compact statistical specification and generates
a customized data analysis algorithm (in C/C++) from it. AutoBayes is written
in SWI Prolog and uses many concepts from rewriting, logic, functional, and symbolic
programming. We will discuss the system architecture, the schema library, and the
extensive support infrastructure.

Practical hands-on experiments and exercises will enable the student to get insight
into a realistic program synthesis system and provide the knowledge to use, modify, and
extend AutoBayes.

http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20080042409_2008042209.pdf
1. Starting AutoBayes


1.1 Command-line 

AutoBayes is usually called from the command line, e.g.,

autobayes -target matlab mix-gaussians.ab

where mix-gaussians.ab is the AutoBayes specification. Command-line options
and pragmas start with a "-".

Note that the current version under Windows requires the following command line
(when starting from a DOS prompt):

autobayes -- -x autobayes -- -target matlab mix-gaussian.ab

1.2 Interactive Mode 


AutoBayes can be started in the Prolog interactive mode as follows:

bash-3.2$ ../autobayes -interactive mog.ab

AutoBayes V0.9.9                         Sat Jul 2 09:42:26 2011

Copyright (c) 1999-2011 United States Government
as represented by the Administrator of the National
Aeronautics and Space Administration.
All Rights Reserved. Distributed under NOSA 1.3


*** Interactive shell started ***

?- load('mog.ab').
Success [mog.ab]: no errors found
true.

?- solve.
... <<all logging messages>>

Listing 1.1: Starting AutoBayes into interactive mode


A number of commands are available in the interactive mode of AutoBayes.
They are defined in interface/commands.pl.

load(+File) loads a specification file and constructs the AutoBayes model.

clear deletes the current AutoBayes model.

show lists the current model.

save(+File) saves the current model in the AutoBayes specification syntax into a
named file (unsupported).

solve attempts to solve the model, generates intermediate code, and lists it on the
screen. This command should also place the generated code into the Prolog data base.

1.3 Loading AutoBayes into Prolog 

$ pl
% library(swi_hooks) compiled into pce_swi_hooks 0.00 sec, 2,284 bytes
Welcome to SWI-Prolog (Multi-threaded, 32 bits, Version 5.10.2)
Copyright (c) 1990-2010 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

For help, use ?- help(Topic). or ?- apropos(Word).

?- [main_autobayes].
<<lots of messages>>
?-

Listing 1.2: Loading AutoBayes into Prolog 


Note that here only the AutoBayes program code will be loaded but not any spec- 
ification or command line flags/pragmas. 




2. AutoBayes Architecture 


2.1 Top-level Architecture 







Figure 2.1: AutoBayes architecture 


2.2 Directory Structure 

The AutoBayes directory structure is shown in Figure 2.2. Note that a global shell
variable, AUTOBAYESHOME, must point to the top-level directory.

2.3 Synthesis and Code Generation 

The synthesis and code generation parts are strictly separated. The synthesis kernel
generates one or more customized algorithms and places them (using assert) into the
Prolog data base under the predicate name synth_code(Stage, Code). After the
synthesis phase, the generated algorithms are retrieved one by one and code is
generated for them. This is done by the predicate main_cg_loop (file: toplevel/main.pl).

Note that the synthesis component does not use any information about the code
generation target.

Each subcomponent of AutoBayes can run individually. AutoBayes can dump
the generated algorithm (-dump synt) into a file, which can then be read in by the
AutoBayes code generator main_codegen(DumpFile). This is accomplished with the
command-line switch -codegen.

2.3.1 Synthesis 

After opening and preprocessing the AutoBayes specification file (using the CPP
preprocessor), the specification is read in using the Prolog parser. Predicates for
handling the specification are in interface/syntax.pl. All information is stored in
the Prolog data base as the AutoBayes model. The goal statement

max pr(...) for VAR_SET

actually triggers the program synthesis. It puts the information into the model as
optimize_target(...).

After reading the specification and processing the command line, the predicate

main_synth(+Specfile)

triggers the synthesis:

• the specification file is preprocessed

• all log files are opened

• depending on the number of requested programs, the predicate main_synth_loop
is called, which calls the schema-based process synth_arch/3. If more than one
program is requested, this predicate is visited again using backtracking. The
program that is generated during each call of that predicate is stored in the
Prolog data base (non-backtrackable).

• after all requested programs have been generated, the synthesis part is finished.
The actual generation of code is done using the predicate main_cg_loop (see
below).
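
The interplay of backtracking and non-backtrackable storage described above can be illustrated with a small, self-contained SWI-Prolog sketch. It is not the AutoBayes implementation: toy_synth/2 and toy_synth_loop/2 are hypothetical stand-ins for the schema-based search and for main_synth_loop, and only the predicate name synth_code/2 is taken from the text. Each solution found on backtracking is stored with assertz/1, which is not undone when the search backtracks for the next program.

```prolog
:- dynamic synth_code/2.

% Hypothetical stand-in for the schema-based search: every solution
% found on backtracking represents one candidate program.
toy_synth(Spec, program(Spec, Variant)) :-
    member(Variant, [closed_form, numeric, generic]).

% toy_synth_loop(+Spec, +Max): store up to Max programs.
% The counter and the asserted programs survive backtracking.
toy_synth_loop(Spec, Max) :-
    retractall(synth_code(_, _)),
    nb_setval(toy_count, 0),
    (   toy_synth(Spec, Program),
        nb_getval(toy_count, N0),
        N is N0 + 1,
        nb_setval(toy_count, N),
        assertz(synth_code(N, Program)),
        N >= Max                 % fails until enough programs are stored,
                                 % which forces backtracking into toy_synth/2
    ->  true
    ;   true                     % search space exhausted before Max was reached
    ).
```

After calling toy_synth_loop(mog, 2), the data base contains two synth_code/2 facts, which a separate code generation loop could then retrieve one by one.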



2.3.2 Code Generation

The code generator is parameterized by the target flag -target, which selects the 
code generation target as well as a number of pragmas. 

The code generation is performed in several stages. Stages that are language-specific
(e.g., C, C++, Ada) are marked with "L"; those that are target-specific are marked with "T".

1. top-level: main_cg_loop

2. for each generated program in synth_code(_, _), perform the proper code
generation

3. main_cg_prog performs:

   • get the name of the generated program

   • get and simplify the complexity bound (if applicable)

   • add declarations for the variables in the for-loops (loopvars)

   • optimize the pseudo-code (pseudo_optimize)

   • check for syntactic correctness of the intermediate code (pseudo_check)

   • list the code after optimization: main_list_code('iopt', ...

   • generate the actual code: cg_codegen(Code)

The predicates for the actual code generation are in the directory codegen and
subdirectories thereof. Its top-level predicate cg_codegen(Code) performs the following
steps:

1. open the symbol table

2. add (external) declarations

3. preprocess the code (cg_preprocess_code):

   • get target language and target system

   • preprocess the pseudo code: cg_preprocess_ps(L,T)

   • transform the code into language/target specific constructs:
     cg_transform_code(L,T)

4. produce the code (cg_produce_code):

   • open all files

   • generate headers: cg_generate_header

   • generate include statements: cg_generate_includes

   • generate declarations of global variables: cg_generate_globals

   • produce code for each component (or for the main procedure). This is done
     using cg_produce_component, which then executes cg_preamble,
     cg_generate_code, and cg_postamble

   • produce end of HTML headers (why?)

   • close the files

5. list the code in various formats

6. produce the design document if desired

The cg_preamble is just a switchboard, which causes the generation of the interface
code for the given procedure, the usage statement, and the input/output declarations.
Similarly, the cg_postamble produces code at the end of the given procedure (e.g.,
handling of return values).

The switchboard for the code generation, cg_generate_code, is in the file
codegen/cg_code.pl and finally calls cg_generate_lowlevel_code, which is specific
for each target system and prints each statement one after the other.

2.3.3 Target Specific Code Generation 


Lang           Target         cmdline-flags

C [c_c++]      Matlab         -target matlab
C [c_c++]      stand-alone    -target standalone
C++ [c_c++]    Octave         -target octave
ADA            stand-alone    -target ada

Table 2.1: Code generator options 


Note that the ADA version is not fully supported. 




Figure 2.2: The directory structure of AutoBayes


3. The Schema System 


The schema-based synthesis process is triggered by the goal expression in the
AutoBayes specification. There are two different kinds of goal expressions:

double mu.
data double x(1..10).
max pr {x | mu} for {mu}.

Listing 3.1: AutoBayes specification for a probabilistic optimization problem

double x.
max -x**2 + 5*x - 7 for {x}.

Listing 3.2: AutoBayes specification for a functional optimization problem

Whereas the first form performs a probabilistic maximization (and triggers calls to
synth_schema), the second form is a functional optimization and triggers
synth_formula_try.

Note that all sorts of probability expressions in the goal are automatically converted
into a log_prob(...) expression.

3.1 The synth_schema Predicate

The top-level schema predicate is

synth_schema(+Goal, +Given, +Problem, -Program)

In most cases, a 5-ary predicate is used to solve log_prob(U,V) problems:

synth_schema(+Theta, +Expected, +U, +V, -Program)

Note that the AutoBayes model has to be considered as an additional "invisible"
argument. For efficiency reasons, AutoBayes implements the model using global
(backtrackable) data structures. Otherwise, the model would need to be carried as
additional arguments, like

synth_schema1(+Theta, +Expected, +U, +V, +ModelIn, -ModelOut, -Program)

That modification would be required not only for the top-level schemas, but also for
all support predicates, making this approach cumbersome.





3.2 The synth_formula Predicate

This predicate is used to solve functional optimization problems. These schemas 
can be either called from the top-level or by other schemas in case some functional 
subproblem must be solved. The top-level predicate is: 

synth_formula(+Vars, +Formula, +Constraints, -Program)

This predicate tries to generate a program (or to solve the problem symbolically) that
finds the optimum (maximum) values of the variables Vars in the formula Formula under
the given constraints.


Note: synth_formula_try is a guarded front-end to synth_formula that handles stack
and tracing. In particular, in case of failure, the dependencies must be restored using
depends_restore.


Figure 3.1 shows the entire (static) schema hierarchy. Note, that during synthesis, 
one schema can trigger arbitrary other schemas in order to solve a given problem. 


3.3 AutoBayes Schema Hierarchy 

AutoBayes has a separate schema hierarchy for probabilistic and functional prob- 
lems. 

All schemas are in Prolog files, which are included in the file synth/synth.pl. Note
that the order is important, as the schema search uses Prolog's backtracking search.

:- discontiguous synth_schema/5.
:- multifile synth_schema/5.
:- discontiguous synth_schema/4.
:- multifile synth_schema/4.

:- discontiguous synth_formula/4.
:- multifile synth_formula/4.
:- dynamic synth_formula/4.
...
synth_schema([], _, _, skip) :-
    !.

:- ['schemas/preprocessing/scaling.pl'].

synth_schema(Goal, Given, log_prob(U,V), Program) :- ...

:- ['schemas/decomp/d_prob.pl'].

Listing 3.3: AutoBayes schema hierarchy - inclusion mechanism


3.3.1 AutoBayes probabilistic Schema Hierarchy 

synth_schema([], _, _, skip) :- !.

:- ['schemas/preprocessing/scaling.pl'].
synth_schema(Goal, Given, log_prob(U,V), Program) :- ...

:- ['schemas/decomp/d_prob.pl'].
:- ['schemas/sequential/kalman.pl'].
:- ['schemas/sequential/sequent.pl'].
:- ['schemas/clustering/rndproject.pl'].
:- ['schemas/clustering/em.pl'].
:- ['schemas/clustering/kmeans.pl'].
synth_schema(Theta, Expected, U, V, Program) :-
    % convert problem to problem over formula for synth_formula/4

Listing 3.4: AutoBayes schema hierarchy - probabilistic schemas


3.3.2 AutoBayes functional Schema Hierarchy 

:- ['schemas/decomp/d_formula.pl'].
:- ['schemas/symbolic/lagrange.pl'].
:- ['schemas/symbolic/solve.pl'].
:- ['schemas/gsl/gsl-maximization.pl'].
:- ['schemas/numeric/section.pl'].
:- ['schemas/numeric/simplex.pl'].
:- ['schemas/numeric/generic.pl'].

Listing 3.5: AutoBayes schema hierarchy - functional schemas


3.3.3 AutoBayes Support Schema Hierarchy 

Several schemas call special-purpose sub-schemas, e.g., to produce code for
initialization. These predicates have non-standardized arguments and form individual
hierarchies. An example is the set of schemas for producing initialization code for the
clustering algorithms in synth/schemas/clustering/clusterinit.pl, with the main schema
predicate

ci_center_select(+CenterName, +DataIn, +IPointsIn, +IClassesIn, +CDim, -Program)





3.4 Adding a New Schema

3.4.1 Example 1 

This example is an existing schema in AutoBayes, which, given a log_prob problem,
tries to solve it symbolically or as a numerical optimization problem. This schema can
be found in synth/synth.pl and has been abbreviated. In particular, all generation
of explanations has been removed for clarity.

synth_schema(Theta, Expected, U, V, Program) :-
    % Decompose as far as possible
    cpt_theorem(U, V, Prob, rels(Theta)),

    % Check whether Prob is atomic and if so, replace it by the density
    prob_replace(Prob, Prob_formula),

    % Extract model constraints
    model_constraint(Pre_constraint),
    simplify(Pre_constraint, Constraint),

    copy_term(Expected, Expected_copy),
    synth_sum_expected(Expected_copy, log(Prob_formula), Pre_formula),
    pv_lift_existential(Pre_formula),
    simplify(Constraint, Pre_formula, Formula),
    assert_trace(trace_schema_inout, 'synth/synth.pl',
                 ['Log-Likelihood function:\n', Formula]),

    % Build the dependency graph from the simplified formula,
    % recurse on the formula and clean up.
    depends_save,
    depends_clear,
    depends_build_from_term(Formula),

    % Find closed-form solution
    synth_formula_try(Theta, Formula, Constraint, Step),

    % Clear stack
    depends_restore,

    % Compose the program
    Program = series([Step],
        [comment('lots of text ...'), f_loglikelihood(Formula)]).

Listing 3.6: Example Schema 
The schema in Listing 3.6 is called with the statistical variables Theta and the
expected variables Expected, as well as U and V, which are the arguments of the log_prob
problem.


The first two subgoals decompose the problem statistically (using the AutoBayes
model) and, if successful, return the probability Prob to be solved. Then it is checked
whether this probability is atomic, i.e., not conditional. The resulting formula
Prob_formula must be considered. This predicate also replaces all PDFs (e.g., gauss) by
the corresponding symbolic formulas (see Chapter 4). These two predicates comprise the
guards for this schema. In order to obtain the (numerical) formula that is to be
optimized, the following steps must be carried out.


The values for the normal example are given in parentheses.

The predicate is called with synth_schema([mu, sigma], [], [x(_)], [mu, sigma],
Program).

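The probability formula is

\[ \prod_{i=0}^{-1+n} P(x_i \mid \mu, \sigma) \]

With the PDF replaced, the problem to solve (Pre_formula) becomes

\[ \log \prod_{i=0}^{-1+n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\,\frac{(x_i-\mu)^2}{\sigma^2}\right) \]
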

where the constraints coming from the model are

and([and([not(0=n_points), 0=<n_points]),
     and([not(0=sigma_sq), 0=<sigma_sq]),
     type(mu, double), type(n_points, nat), type(sigma_sq, double), type(x, double)])

• Because the log-likelihood is maximized, the logarithm of the probabilistic formula
must be taken.

• This formula is then transformed into a sum with respect to the Expected values.

• This sum is simplified under the given constraints.

• The schema-driven solution of the problem is attempted with synth_formula_try, and a
code segment is returned in Step.

• The final program segment is a code block containing that code segment.

• After processing, dependencies must be restored.
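
For the normal example, the effect of these steps can be written out by hand. The following is the textbook derivation for independent Gaussian data, given only as a sketch: the symbols n, \mu, and \sigma follow the example call above, and the exact intermediate form produced by the AutoBayes simplifier may differ.

```latex
\begin{align*}
\log \prod_{i=0}^{n-1} P(x_i \mid \mu, \sigma)
  &= \sum_{i=0}^{n-1} \log P(x_i \mid \mu, \sigma) \\
  &= -\frac{n}{2}\log(2\pi) - n \log\sigma
     - \frac{1}{2\sigma^2}\sum_{i=0}^{n-1} (x_i - \mu)^2 .
\end{align*}
% Setting the partial derivatives with respect to \mu and \sigma to zero
% yields the closed-form maximum:
\begin{align*}
\hat\mu = \frac{1}{n}\sum_{i=0}^{n-1} x_i ,
\qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=0}^{n-1} (x_i - \hat\mu)^2 .
\end{align*}
```

A closed-form solution of this kind is what the symbolic schemas attempt to find before the numerical schemas are tried.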

3.4.2 Example 2 

3.5 Notes 

• scaling: must extract sigma 

• loop around EM: flag controlled or statistics controlled 

• numerical optimization: regula falsi 

• multivariate optimization full synthesis, based upon gsl utilities 

3.6 Schema Control 

• Prolog backtracking search

• multiple programs (-maxprog)

• multiple programs with complexity (unsupported)

• control via pragmas

• schema-control language
Figure 3.1: The static schema hierarchy for AutoBayes


4. Probability Density Functions 


Probability Density Functions (PDFs) and their properties can be defined easily. In
order to add a new PDF, e.g., mypdf, two places must be modified: (1) the symbol
must be made a special symbol for the input parser (file interface/symbols.pl), and
(2) its properties must be defined in the file synth/distribution.pl.


As an example, the PDF mypdf should have the same properties as the regular
Gaussian, i.e., be defined for one variable and have two parameters, e.g.,
X ~ mypdf(a, b).



For each PDF, its density with respect to the parameters must be given, along with the
mean, the mode, and the variance. Specific constraints for each PDF can also be given.
However, the current version of AutoBayes does not use these constraints.
4.1 The AutoBayes Model

The statistical model, as given by the specification, is stored in a global data
structure, the model. The predicates for handling the model are mainly in the file
synth/model.pl.

4.1.1 The Model Data Structure

The current contents of the entire model can be printed or written into a stream using
the predicate model_display. The predicate names (e.g., model_name) are those that
are stored in the Prolog data base in a backtrackable manner using bassert and
bretract.

?- model_display.

Model: mog
Vers.: 0

%====== NAMES:
model_name(x)
model_name(c)
...
%====== TYPES:
model_type(x, double)
model_type(c, nat)
model_type(sigma, double)
...
%====== CONSTANTS:
model_constant(n_classes)
model_constant(n_points)
%====== OUTPUTS:
model_output(c)
%====== VARIABLES:
model_var(x)
model_var(c)
model_var(sigma)
model_var(mu)
model_var(phi)
%====== RANDOM:
model_random(x)
model_random(c)
...
%====== INDEXED:
model_indexed(x, [dim(0, +[-1, n_points])])
model_indexed(c, [dim(0, +[-1, n_points])])
%====== DISTRIBUTIONS:
var_distributed(x(A), gauss, [mu(c(A)), sigma(c(A))])
var_distributed(c(A), discrete, [vector((B := 0 .. +[-1, n_classes]), phi(B))])
%====== KNOWNS:
var_known(x(A))
%====== CONSTRAINTS:
var_constraint(sigma(A), and([not(0=sigma(A)), 0=<sigma(A)]))
var_constraint(phi(A), 0= +[-1, sum([idx(B, 0, +[-1, n_classes])], phi(B))])
var_constraint(n_points, n_classes << n_points)
...
%====== OPTIMIZE:
optimize_target([mu(A), phi(B), sigma(C)], [], log_prob([x(D)], [mu(E), phi(F), sigma(G)]))

Listing 4.3: Displaying the AutoBayes model 


4.1.2 The Model Stack 


During the synthesis process, schemas can modify the model. Since the schema-based
synthesis process is done using a search with backtracking, changes to the model must
be undone in case a schema fails.

Therefore, AutoBayes uses a backtrackable data structure for the model and a model
stack. Before a schema or subschema modifies a model, it usually generates a copy of
the model (model_save) on the stack. That copy can then be modified or destroyed, or
the old model can be restored with a pop on the stack (model_restore).

The individual predicates are:


model_clear/0      remove any modifiable model parts
model_destroy/0    completely remove a model from the database
model_save/0       save modifiable model parts at next level
model_restore/0    restore modifiable model parts to previous level


Figure 4.1 shows the relation between the calling tree of the schemas (with the current
schema shaded) and the model stack. Note that not all schemas modify the model.
For these schemas, no new copy of the model needs to be made.
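
The save/modify/restore discipline can be sketched in a few lines of SWI-Prolog. This is an illustration only, not the code in synth/model.pl: all toy_* predicates are hypothetical, the model is reduced to a plain list of facts, and the built-in backtrackable global variables b_setval/2 and b_getval/2 stand in for the AutoBayes-specific backtrackable data structures.

```prolog
% Hypothetical toy model: a list of facts kept in a backtrackable
% global variable, plus a stack of saved copies.
toy_model_init(Facts) :-
    b_setval(toy_model, Facts),
    b_setval(toy_stack, []).

% Push a copy of the current model onto the stack.
toy_model_save :-
    b_getval(toy_model, Model),
    b_getval(toy_stack, Stack),
    b_setval(toy_stack, [Model|Stack]).

% Pop the most recently saved model and make it current again.
toy_model_restore :-
    b_getval(toy_stack, [Model|Stack]),
    b_setval(toy_model, Model),
    b_setval(toy_stack, Stack).

% Toy counterpart of a model modification: mark X as known.
toy_model_makeknown(X) :-
    b_getval(toy_model, Model),
    b_setval(toy_model, [known(X)|Model]).

% A schema that temporarily modifies the model.
toy_schema(ModelAfter) :-
    toy_model_save,
    toy_model_makeknown(c),
    % ... a subproblem would be solved against the modified model here ...
    toy_model_restore,
    b_getval(toy_model, ModelAfter).
```

Calling toy_model_init([var(x), var(c)]) followed by toy_schema(M) in the same query leaves the model unchanged after the schema has finished, which is the invariant the model stack is meant to guarantee.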


4.1.3 Modifying the Model 

The model can be modified using predicates in synth/model.pl. E.g., model_makeknown(X)
makes the statistical variable X known in the model.

Figure 4.1: Relationship between the dynamic calling tree of schemas (left) and the
model stack (right).



5. Low-level Components of AutoBayes 


5.1 Command Line Options and Pragmas 

AutoBayes is called from the command line with command-line options and pragmas.
Command-line options (starting with a "-") control the major operations of
AutoBayes. Pragmas are a flexible mechanism for various purposes, like setting specific
output options, controlling individual schemas, or for debugging and experimentation.

5.1.1 Pragmas

In AutoBayes all pragmas are implemented as Prolog flags. The command-line
interpreter analyzes all tokens starting with -pragma and sets the flag accordingly.

Pragmas can be set inside an AutoBayes specification using the flag directive, e.g.,

flag(schema_control_init_values, _, automatic).

In that case, no check of the validity of the flag's name or its value is performed.

Adding a new Pragma

All pragmas are defined in the file startup/flags.pl. Pragmas are declared by a
pragma/6 multifile predicate:

pragma(SYSTEM, NAME, TYPE, INIT, VL, DESC).

where

SYSTEM = _ | 'AutoBayes' | 'AutoFilter'

NAME = name of pragma = name of flag

TYPE = boolean | integer | atomic | callable ...

INIT = initial value

VL = [ [V,E], ... ] possible values and explanations

DESC = atom containing description

pragma('AutoBayes', schema_control_init_values, atomic, automatic,
    [
        [automatic, 'calculate best values'],
        [arbitrary, 'use arbitrary values'],
        [user, 'user provides values (additional input parameters)']
    ],
    'initialization of goal variables').

pragma('AutoBayes', schema_control_solve_partial, boolean, true, [],
    'enable partial symbolic solutions').

pragma('AutoBayes', example_pragma, integer, 99, [],
    'Example for an integer pragma').

Listing 5.1: AutoBayes pragmas 
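
Because pragma/6 records both the type and the admissible values of each pragma, a validity check can be derived directly from the declarations. The following SWI-Prolog sketch shows one way such a check might look. It is not part of AutoBayes: toy_pragma_valid/2 and toy_type_ok/2 are hypothetical names, and the two pragma/6 facts are abbreviated copies in the style of the listing above.

```prolog
% Abbreviated declarations in the pragma/6 format described above.
pragma('AutoBayes', schema_control_init_values, atomic, automatic,
       [[automatic, 'calculate best values'],
        [arbitrary, 'use arbitrary values'],
        [user, 'user provides values']],
       'initialization of goal variables').
pragma('AutoBayes', example_pragma, integer, 99, [],
       'Example for an integer pragma').

% toy_pragma_valid(+Name, +Value): hypothetical check that Name is a
% declared pragma and that Value matches its type and value list.
toy_pragma_valid(Name, Value) :-
    pragma(_, Name, Type, _, Allowed, _),
    toy_type_ok(Type, Value),
    (   Allowed == []
    ->  true                          % no explicit value list: type check only
    ;   memberchk([Value, _], Allowed)
    ).

% Type tests for the TYPE field of pragma/6.
toy_type_ok(boolean,  V) :- memberchk(V, [true, false]).
toy_type_ok(integer,  V) :- integer(V).
toy_type_ok(atomic,   V) :- atomic(V).
toy_type_ok(callable, V) :- callable(V).
```

With these definitions, toy_pragma_valid(example_pragma, 7) succeeds, whereas toy_pragma_valid(schema_control_init_values, bogus) fails because bogus is not among the listed values.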


5.2 Backtrackable Global Data 

The schema-based synthesis process of AutoBayes uses Prolog's backtracking
mechanism. In particular, the statistical model is modified during the search process
by schemas. These changes must be undone during backtracking.

Since the AutoBayes model is kept as a global data structure in the Prolog data
base, mechanisms for backtrackable global data structures, namely flags and counters,
had to be developed.

These predicates have been implemented in C as external predicates.

NOTE: More recent versions of SWI Prolog might have similar mechanisms already
incorporated.
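
One such mechanism in current SWI-Prolog releases is the pair of backtrackable global variable predicates b_setval/2 and b_getval/2, together with their non-backtrackable counterparts nb_setval/2 and nb_getval/2. The sketch below only demonstrates the difference between the two; the predicate names demo_backtrackable/1 and demo_non_backtrackable/1 and the key names are made up for this illustration, and the sketch does not show how AutoBayes itself is implemented.

```prolog
% Assignment with b_setval/2 is undone when execution backtracks
% past the call, just like a variable binding.
demo_backtrackable(Value) :-
    b_setval(demo_flag, 0),
    (   b_setval(demo_flag, 1),
        fail                      % force backtracking
    ;   b_getval(demo_flag, Value)
    ).

% Assignment with nb_setval/2 survives backtracking.
demo_non_backtrackable(Value) :-
    nb_setval(demo_nb_flag, 0),
    (   nb_setval(demo_nb_flag, 1),
        fail                      % force backtracking
    ;   nb_getval(demo_nb_flag, Value)
    ).
```

The first predicate returns the value 0, because the assignment of 1 is undone on backtracking; the second returns 1.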

5.2.1 Backtrackable Flags 

Backtrackable flags are indexed by a natural number between 0 and N, where N is
fixed at compile time (system/SWI/bflag.c).

The predicate pl_bflag(+N, -V1, +V2) gets the current value of backtrackable flag
number N in V1 and sets the new value V2. Getting and setting values are done in
the same way as for the standard Prolog flag/3.

5.2.2 Backtrackable Counters 

Similar to the AutoBayes counters (util/counter.pl), backtrackable counters are
defined by the following predicates:

bcntr_new(C)          % new counter
bcntr_set(C,M)        % set the counter
% the predicates below are backtrackable
bcntr_get(C,N)        % get the current counter value
bcntr_inc(C)          % increment the counter
bcntr_inc(C,Incr)     % increment counter by Incr
bcntr_dec(C)          % decrement the counter

Listing 5.2: Interface predicates for backtrackable counters 


5.2.3 Backtrackable Bitsets 


Backtrackable bitsets have been implemented as external predicates in C to enable
backtrackable asserts and retracts. The extension defines one global backtrackable bit
set for integers 1..BSET_DEFAULT_LENGTH and two interface predicates: inbset(X)
succeeds if number X is in the bit set, setbset(X,1) adds number X to the bit set,
and setbset(X,0) removes number X from the bit set. The latter two calls are
backtrackable. Note that bitsets are used only for the implementation of backtrackable
asserts/retracts (see Section 5.2.4).


5.2.4 Backtrackable Asserts/Retracts 

This module contains the Prolog support for backtrackable asserts and retracts, i.e.,
an assert/retract mechanism that is integrated with the normal backtracking
mechanism of Prolog. A predicate (here the unary predicate p) is declared as a
backtrackable predicate via

:- backtrackable p/1.


in a similar way to a dynamic declaration. Backtrackable asserts and retracts are done
via bassert and bretract. Here, "backtrackable" means that the assertions are undone
on backtracking by the Prolog engine in the same way variable bindings are undone, e.g.:


q(X) :-
    ...,
    bassert(p(a)),    % will be undone/retracted on backtracking
    ...,
    fail,
    ... .
q(X) :-
    ...,
    p(a),             % fails
    ... .


Listing 5.3: Backtrackable assertions 
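
The observable behavior of a backtrackable assert can be imitated in plain Prolog by leaving a choice point that retracts the fact again when it is re-entered. The sketch below uses this classic trick; toy_bassert/1 and demo_q/1 are hypothetical names, and the approach is deliberately simplified compared to the C-based implementation described here (for instance, a cut after toy_bassert/1 would discard the choice point and make the assertion permanent).

```prolog
:- dynamic p/1.

% toy_bassert(+Fact): assert Fact; on backtracking into this call,
% retract it again and fail.
toy_bassert(Fact) :-
    (   assertz(Fact)
    ;   retract(Fact),
        fail
    ).

% First alternative asserts p(a) and then fails, so the assertion is
% undone before the second alternative looks for p(a).
demo_q(Seen) :-
    (   toy_bassert(p(a)),
        fail
    ;   (   p(a)
        ->  Seen = yes
        ;   Seen = no
        )
    ).
```

Calling demo_q(Seen) yields Seen = no, mirroring the second clause of q/1 in the listing, where p(a) fails after backtracking.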
5.3 Data Structures and Their Predicates

See the files and their documentation in util:

bag.pl Predicates for a Prolog representation of bags

diffset.pl Predicates for a compact representation of differences between arbitrary
term sets (see termset.pl)

equiv.pl Calculates the equivalence class of a binary relation that is given as a list of
lists

listutils.pl Predicates for handling of lists

meta.pl Meta-operations on uninterpreted Prolog terms (e.g., unification, etc.)

stack.pl Prolog representation of a stack

subsumes.pl Subsumption check

term.pl Prolog representation for AC terms

termset.pl Prolog representation for sets of term instances

topsort.pl Topological sort

trans.pl Calculates the transitive closure of a relation

5.4 The Rewriting Engine 

A rewriting engine has been implemented on top of Prolog. Rewriting rules are given
as Prolog clauses, which are compiled for efficiency reasons.

5.4.1 Rewriting Rules

The rules for rewriting must be given as a predicate of the form

rule(+Name, +Strategy, +Prover, +Assumptions, +TermIn, ?TermOut)

where the parameters have the following meaning:

Name string or atom used to identify the rewrite rule (e.g., in tracing); should be
unique.

Strategy a strategy vector of the form

[eval=Evaluation, flatten=Bool, order=Bool, cont=Continuation]

associated with each rule. Evaluation must be either eager or lazy; Continuation
is either a Bool or a rule name. Rules with strategy [eval=eager | _] are
applied a first time in a top-down fashion (i.e., before the subtrees are
normalized). Rules with strategy [eval=lazy | _] are applied in a bottom-up fashion.

If the continuation argument of rwr_cond is fail, pure bottom-up rewriting is
implemented; otherwise dovetailing is implemented (i.e., exhaustive rewriting).

Use the strategy vector [eval=lazy | _] as default for all rules to get the complete
innermost/outermost strategy. Use a rule

rule('block-f', [eval=eager,_,_,cont=fail | _], f(X), f(X)).

to prevent rewriting of all subtrees with root symbol f.

Prover currently not used

Assumptions Use the assumption 'true' for unconditional rewriting.

TermIn Term to be normalized.

TermOut Result of rule application.


Rewriting rules are either simple unit clauses or complex rules with bodies (Listing 5.4).

expr_optimize(_, 'expr-reintroduce-reciprocal',
        [eval=lazy|_],
        _, _,
        Term ** (-1),
        1 / Term
    ) :-
    !.

expr_optimize(Level, 'expr-reintroduce-subtraction',
        [eval=lazy|_],
        _, _,
        +(Summands),
        Subtraction
    ) :-
    Level > 0,
    list_split_with(factors_negate, Summands, Neg, Pos),
    Neg \== [],
    !,
    (Pos cases
        [
            []  -> expr_mk_subtraction(Neg, Subtraction),
            [P] -> expr_mk_subtraction(P, Neg, Subtraction),
            _   -> expr_mk_subtraction(+(Pos), Neg, Subtraction)
        ]
    ).

Listing 5.4: Examples for Rewriting Rules
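
The bottom-up (lazy) strategy can be made concrete with a deliberately small rewriter in plain SWI-Prolog. This is only a sketch of the general technique, not the AutoBayes engine: toy_rule/2 and toy_rewrite/2 are hypothetical, rules are unconditional, there is no strategy vector or rule compilation, and the input term is assumed to be ground.

```prolog
% Hypothetical unconditional rules: toy_rule(+TermIn, -TermOut).
toy_rule(X * 1, X).
toy_rule(1 * X, X).
toy_rule(X + 0, X).
toy_rule(0 + X, X).
toy_rule(X ** (-1), 1 / X).

% toy_rewrite(+TermIn, -TermOut): innermost (bottom-up) normalization.
% The subterms are normalized first; then the rules are tried at the
% root, and the result is normalized again until no rule applies.
toy_rewrite(T, T) :-
    var(T),
    !.
toy_rewrite(T0, T) :-
    T0 =.. [F|Args0],
    maplist(toy_rewrite, Args0, Args),
    T1 =.. [F|Args],
    (   toy_rule(T1, T2)
    ->  toy_rewrite(T2, T)
    ;   T = T1
    ).
```

For example, the query toy_rewrite((b*1 + 0) * a ** (-1), T) normalizes the left factor to b and the right factor to 1/a, giving T = b*(1/a).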


5.4.2 Compilation of Rewriting Rules 


A (customized) set of rewriting rules is compiled into a ruleset using the directive
rwr_compile. Note that the individual groups of rewriting rules can be placed in
separate files.


ruleA('ruleA:1', [eval=eager|_], _, _,
      Source, Target) :- !.
ruleA('ruleA:2', [eval=eager|_], _, _,
      Source, Target2) :- !.

ruleB('ruleB:1', [eval=lazy|_], _, _,
      Source, Target) :- !.

:- rwr_compile(myruleset,
    [
        rulesA,
        rulesB
    ]).

do_rewrite(S, T) :-
    rwr_cond(myruleset, true, S, T).

do_rewrite_timelimit(S, T, Max) :-
    call_with_time_limit(Max, rwr_cond(myruleset, true, S, T)).
do_rewrite_timelimit(S, S, _).


Listing 5.5: Compilation of Rewriting Rules and top-level calls 
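
The interplay of eager (top-down) and lazy (bottom-up) rules can be illustrated outside of Prolog. The following Python sketch is not the AutoBayes rewriting engine; it only mimics the strategy described above on nested tuples. The names `rewrite`, `reciprocal`, and `block_f` are made up for this illustration, and returning immediately from an eager rule is an assumption that stands in for `cont=fail`.

```python
# Terms are nested tuples (functor, arg1, arg2, ...); atoms are str or int.

def reciprocal(term):
    """Lazy rule: Term ** (-1)  ->  1 / Term."""
    if isinstance(term, tuple) and term[0] == "**" and term[2] == -1:
        return ("/", 1, term[1])
    return None  # rule does not apply


def block_f(term):
    """Eager rule: leave every subtree rooted at f untouched."""
    if isinstance(term, tuple) and term[0] == "f":
        return term
    return None


EAGER = [block_f]
LAZY = [reciprocal]


def rewrite(term):
    # Eager rules fire top-down, before the subtrees are normalized.
    for rule in EAGER:
        result = rule(term)
        if result is not None:
            return result  # assumption: models cont=fail, no descent
    if isinstance(term, tuple):
        # Normalize the subtrees first (bottom-up) ...
        term = (term[0],) + tuple(rewrite(arg) for arg in term[1:])
    # ... then try the lazy rules on the rebuilt node.
    for rule in LAZY:
        result = rule(term)
        if result is not None:
            return rewrite(result)  # continue until no rule applies
    return term


if __name__ == "__main__":
    print(rewrite(("+", ("**", "x", -1), ("f", ("**", "y", -1)))))
```

In this sketch the subterm below `f` keeps its `** (-1)` form because the eager blocking rule returns before the lazy rule ever sees it, while the sibling subterm is rewritten to a reciprocal.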




6. The Symbolic System 


AutoBayes uses its symbolic subsystem extensively. The system is in part imple- 
mented as rewriting rules and in part as Prolog predicates. 

6.1 Top-Level Predicates 

Some of the common top-level predicates are: 

simplify(S, T) simplifies expression S and returns T. 

simplify(Assumptions, S, T) simplifies expression S and returns T under the given 
assumptions. 

range_abstraction(+S, -Range) provides a range abstraction for S. 

range_abstraction(+Assumptions, +S, -Range) provides a range abstraction for 
S under the given assumptions. 

defined(S, Condition) provides a definedness constraint for S. 

defined(Assumptions, S, Condition) provides a definedness constraint for S 
under the given assumptions. 

solve(Assumptions, Var, Equation, Solution) calls the symbolic equation solver 
to solve the equation Equation for the variable Var under the given assumptions. 

leqs_solve(Assumptions, Vars, Equations, Solution) attempts to symbolically 
solve a system of linear equations using Gaussian elimination and returns a 
solution. This predicate can use local program variables for subexpressions, 
so a let(...) expression is returned. 

Note that for this predicate, the terms must be in list notation. 

    ?- simplify((a+b)*(a-b), T), print_expr(user_output, 0, T, _). 
    -1 * b ** 2 + a ** 2 
    T = +[*([-1, b**2]), a**2] 

    ?- simplify(sin(x)**2 + cos(x)**2, T). 
    T = 1. 

    ?- defined(1/x, D). 
    D = not(0=x). 

    ?- defined(tan(x), D). 
    D = not(0=cos(x)). 

    ?- solve(true, x, 5*x**2 - 3 = 0, S), print_expr(user_output, 0, S, _). 
    1 / 10 * 60 ** (1 / 2) 
    S = *([1/10, 60**(1/2)]). 

    ?- solve(true, x, 17*x - 3 = 0, S), print_expr(user_output, 0, S, _). 
    3 / 17 
    S = 3/17. 

    ?- leqs_solve([], [x,y], [x, *([5,y])], Y). 
    Y = let(local([]), series([skip, skip, skip, skip, skip, skip, skip], 
            []), [y=0, x=0]). 

    ?- leqs_solve([], [x,y], [x, +([5,y])], Y). 
    Y = let(local([]), series([skip, skip, skip, skip, skip, skip, skip], 
            []), [y= -5, x=0]). 

    ?- leqs_solve([], [x,y], [x, +([5,y,x])], Y). 
    Y = let(local([]), series([skip, skip, skip, skip, skip, skip, skip], 
            []), [y= -5, x=0]). 

    ?- leqs_solve([], [x,y], [+([x,1]), +([5,y,x])], Y). 
    Y = let(local([]), series([skip, skip, skip, skip, skip, skip, skip], 
            []), [y= -4, x= -1]) 

Listing 6.1: Examples for symbolic subsystem 
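
The results of the linear-equation examples can be cross-checked with a few lines of exact Gaussian elimination. The Python sketch below is independent of AutoBayes and does not reproduce the let(...) output format; it assumes that every list element passed to the solver is read as a term equal to zero, which is consistent with the answers shown in the listing. The function name `solve_linear` and the augmented-row input format are choices made for this illustration.

```python
from fractions import Fraction


def solve_linear(rows):
    """Solve a square system exactly; each row is [a_1, ..., a_n, b]
    and stands for a_1*x_1 + ... + a_n*x_n = b."""
    n = len(rows)
    m = [[Fraction(v) for v in row] for row in rows]
    for col in range(n):
        # Find a row with a non-zero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from all other rows.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]


if __name__ == "__main__":
    # x + 1 = 0 and 5 + y + x = 0, i.e. x = -1 and x + y = -5
    x, y = solve_linear([[1, 0, -1], [1, 1, -5]])
    print(x, y)
```

Exact rational arithmetic is used here because a symbolic solver must not introduce rounding errors; a numeric solver would normally use floating point and pivoting for stability instead.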


6.2 Program Variables 

The AutoBayes system distinguishes between different kinds of variables. This is 
necessary because Prolog variables have to be kept distinct from the code 
variables that show up in the generated code fragments. The latter kind of variable 
is called a program variable. 

Program variables are not represented by Prolog variables (because no unification can 
be allowed there), but by a reserved term pv(n), where n is a number. Such program 
variables can be universally quantified or existentially quantified. The latter is used, 
e.g., to convert Prolog variables in a term into actual variable names. 

During pretty-printing or in the final code, existential variables are printed as pv###, 
e.g., pv96. 

    % get a new fresh (existential) variable. The "pv1" is the 
    % external format 
    ?- pv_fresh_existential(X), print_expr(user_output, 0, X, _). 
    pv1 
    X = pvar(1). 

    % convert index variable for a sum into program variables 
    ?- C = sum(idx(X,0,10), d(X)), 
       pv_lift_existential(C), 
       print_expr(user_output, 0, C, _). 
    sum(pv3 := 0 .. 10, d(pv3)) 
    C = sum(idx(pvar(3), 0, 10), d(pvar(3))), 
    X = pvar(3). 

Listing 6.2: Predicates for program variables 
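
The idea behind lifting can be sketched in Python: every unbound placeholder in a term is replaced, consistently, by a freshly numbered program variable. This is only an analogy to the predicates above, not their implementation. The class `Var` and the functions `fresh_existential`, `lift_existential`, and `show` are hypothetical names, and unlike the Prolog version, which binds the variables in place, the sketch returns a new term.

```python
import itertools


class Var:
    """Stand-in for an unbound (Prolog-style) variable."""


_counter = itertools.count(1)


def fresh_existential():
    """Return a new program variable as the tuple ('pvar', n)."""
    return ("pvar", next(_counter))


def lift_existential(term, bindings=None):
    """Replace every Var in term by a program variable, consistently."""
    if bindings is None:
        bindings = {}
    if isinstance(term, Var):
        if term not in bindings:
            bindings[term] = fresh_existential()
        return bindings[term]
    if isinstance(term, tuple):
        return tuple(lift_existential(t, bindings) for t in term)
    return term


def show(term):
    """External format: ('pvar', n) is printed as pv<n>."""
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "pvar":
        return f"pv{term[1]}"
    if isinstance(term, tuple):
        return f"{term[0]}({', '.join(show(t) for t in term[1:])})"
    return str(term)


if __name__ == "__main__":
    x = Var()
    print(show(lift_existential(("sum", ("idx", x, 0, 10), ("d", x)))))
```

Both occurrences of the placeholder receive the same number because the bindings dictionary is shared across the whole traversal.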




7. Pretty Printing and Text Generation 


7.1 Pretty Printer 

A piece of pseudo-code can be pretty-printed using pp_pseudo(+Stmt). It pretty- 
prints the statement onto the screen (or into a file if a stream is given as the first 
argument). 

An expression can be printed into a stream using print_expr(+Stream, +Indent, 
+Expr, ?NewPos). 

The syntax definition of the intermediate language is given in Appendix A. 


    ?- pp_pseudo(assign(x, 5*x**3 - 5, [comment('initial value')])). 
    // initial value 
    x := 5 * x ** 3 - 5; 
    true 

    ?- print_expr(user_output, 0, x**2+cos(x), _). 
    x ** 2 + cos(x) 
    true. 

Listing 7.1: Printing statements and terms 


7.2 Pretty Printer for LaTeX and HTML 

To generate an HTML or LaTeX representation of an expression or a piece of code, 
the same pretty-printer interface is used. The actual output format is controlled by 
various flags. 

pp_latex_output if set to 1, LaTeX output will be generated 

pp_html_output if set to 1, HTML output will be generated 

Additional predicates in pp_*.pl provide support for writing headers, etc. 

    ?- pp_pseudo(assign(x, x+1, [comment('update x')])). 
    // update x 
    x := x + 1; 
    true 

    ?- flag(pp_latex_output, _, 1), 
       pp_pseudo(assign(x, x+1, [comment('update x')])). 

    /*@\setlength{\mywidth}{0pt}\addtolength{\mywidth} 
    {78\myspace}\begin{minipage}{\mywidth}\small\vspace*{0.5ex} 
    \rm\em\noindent{}update x\end{minipage}@*/ 
    x := x + 1; 
    true. 

    ?- flag(pp_latex_output, _, 0), 
       flag(pp_html_output, _, 1), 
       pp_pseudo(assign(x, x+1, [comment('update x')])). 
    <font color="green"> 
    //&nbsp;update&nbsp;x<br></font> 
    x&nbsp;:=&nbsp;x&nbsp;+&nbsp;1;<br></tt> 
    </body> 
    </html> 
    true 

Listing 7.2: Pretty printing to LaTeX and HTML 


7.3 Support for Text Generation 

Generation of explanations and comments in the synthesized code is of great im- 
portance. Only a well-documented autogenerated algorithm can be used and under- 
stood. AutoBayes contains a number of predicates to facilitate the generation of 
text fragments to explain schemas and code. These texts are handled as comments 
in the intermediate language and stored as comment(...) in the attribute list, e.g., 
assign(x, 0, [comment('Initial assignment')]). 

The full-powered schema-based synthesis approach requires that the explanation text 
can be customized for scalars, vectors, and matrices, for single elements and 
enumeration lists, etc. Predicates in synth/lexicon.pl provide functionality for this 
purpose. 

    lex_probability_atom(Prob, XP_prob), 
    (XP_prob = *(Prob_args) 
      -> true 
      ;  Prob_args = [] 
    ), 
    lex_numerus_align('The ', Prob_args, XP_prob_article), 
    lex_numerus_align('probability', Prob_args, XP_prob_numerus), 
    lex_numerus_align('is', Prob_args, XP_prob_verb), 
    lex_numerus_align('function', Prob_args, XP_prob_density), 
    lex_enumerate_vars(Theta, XP_theta), 
    lex_probability_atom(Prob, XP_prob_atom), 
    (Expected = [] 
      -> XP_likelihood = [ 
           ' This yields the log-likelihood function', expr(Pre_formula), 
           'which can be simplified to', expr(Formula) 
         ] 
      ;  (maplist(arg(1), Expected, EVars_list), 
          flatten(EVars_list, EVars), 
          lex_enumerate_vars(EVars, XP_EVars), 
          XP_likelihood = [ 
            ' Summing out the expected ', XP_EVars, 
            ' yields the log-likelihood function ', expr(Pre_formula), 
            ' which can be simplified to ', expr(Formula) 
          ] 
         ) 
    ), 
    XP = [ 
      'The ', XP_p_type, XP_p, ' is under the dependencies given in the ', 
      'model equivalent to ', expr(XP_prob_atom), 
      XP_prob_article, XP_prob_numerus, ' occuring here ', XP_prob_verb, 
      ' atomic and can thus be replaced by the respective probability ', 
      'density ', XP_prob_density, ' given in the model.', XP_likelihood, 
      ' This function is then optimized w.r.t. the goal ', XP_theta, 

Listing 7.3: Generation of Explanation in a schema synth/synth.pl 
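
The purpose of aligning the grammatical number can be shown with a small stand-alone sketch. The Python code below is not a port of synth/lexicon.pl; the lexicon table and the functions `numerus_align`, `enumerate_vars`, and `explain` are hypothetical, and the rule "plural if there is more than one argument" is an assumption derived from the listing, where a single probability atom yields an empty argument list.

```python
# Hypothetical lexicon: singular form -> plural form.
PLURAL = {
    "probability": "probabilities",
    "is": "are",
    "function": "functions",
}


def numerus_align(word, args):
    """Return word in singular or plural, matching the number of args."""
    return PLURAL[word] if len(args) > 1 else word


def enumerate_vars(names):
    """Join names as 'a', 'a and b', or 'a, b and c'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def explain(prob_args, theta):
    """Build one explanation sentence pair from aligned fragments."""
    noun = numerus_align("probability", prob_args)
    verb = numerus_align("is", prob_args)
    goal = "variable" if len(theta) == 1 else "variables"
    return (f"The {noun} occurring here {verb} atomic. This function is "
            f"then optimized w.r.t. the goal {goal} {enumerate_vars(theta)}.")


if __name__ == "__main__":
    print(explain([], ["mu", "sigma_sq"]))
    print(explain(["p1", "p2"], ["mu"]))
```

Keeping the word forms in a table separates the wording from the schema logic, so the same schema can produce grammatical text for a single probability and for a product of several.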



Appendix A. AutoBayes Intermediate Language 


NOTE: The BNF description of the AutoBayes intermediate language is 
not up-to-date 

AutoBayes uses a simple procedural intermediate language when it synthesizes code. 
This language is kept through all stages (synt, iopt, lang), until at the final stage, code 
in the target language’s syntax is produced. 

The intermediate code for AutoBayes is a relatively generic (procedural) pseudo 
code which contains specific means for handling numeric data and data structures 
like vectors and arrays. Syntactically, a program in that pseudo-code is a term as 
defined below. 

For extended purposes, ATTR is introduced for most language constructs. It 
contains attributes (e.g., state of initialization of the variable) or annotations which 
could contain explanations. ATTR is a list of well-formed (opaque) terms or the empty 
list []. 

A.1 Code 

This top-level functor splits the program into a declarations part and a statements part. 

PSEUDO_PROGRAM ::= 
    prog( IDENT, DECLS, STMT, ATTR ) 

Changes: code now contains a full list of declarations. IDENT will be the name of the 
function/program. 

A.2 Declarations 

All identifiers used within the code must have appropriate declarations; the only 
exceptions are index variables occurring within sums, loops, etc., as such constructs 
can easily be transformed into individual blocks containing the local declarations at 
the beginning of the construct. (Note that this requires different names for loop 
variables which occur in nested loops.) 

Constants and variables are declared in a declaration block at the beginning of the 
program. The declaration block distinguishes between constant values; input, which 
are the parameters given to the synthesized routine; output, which are the results 
returned by the synthesized routine; and local variables. 

Symbolic model constants, for example the dimensions of vectors, are represented 
either as constants, if their value is given by the model or can be derived from other 
given constants or input variables, or as input variables, if their value must be supplied 
at runtime. 

DECLS ::= 
    decls( 
        constant( [ DECL_LIST ] ), 
        input( [ DECL_LIST ] ), 
        output( [ DECL_LIST ] ), 
        local( [ DECL_LIST ] ) 
    ) 

DECL_LIST ::= 
      DECL 
    | DECL , DECL_LIST 

DECL ::= 
      SCALAR_DECL 
    | VECTOR_DECL 
    | MATRIX_DECL 
    | ARRAY_DECL 

SCALAR_DECL ::= 
    scalar( IDENT, TYPE_IDENT, ATTR ) 

VECTOR_DECL ::= 
    vector( IDENT, TYPE_IDENT, [ DIM_LIST ], ATTR ) 

MATRIX_DECL ::= 
    matrix( IDENT, TYPE_IDENT, [ DIM_LIST ], ATTR ) 

ARRAY_DECL ::= 
    array( IDENT, TYPE_IDENT, [ DIM_LIST ], ATTR ) 

TYPE_IDENT ::= 
      double 
    | float 
    | int 
    | bool 

Changes: declarations for vectors are similar to the old format, but now also contain 
the lower bounds. Note: giving the name with a set of FVARS only introduces 
IDENT/n, not IDENT/0 and IDENT/n. 

Variables marked const never occur on the left-hand side of an assignment. 

A.3 Indices and dimensions for vectors, arrays, and matrices 

All indices into vectors or arrays (e.g., for declaration, iterative constructs) are given 
as lists of triples with the functor idx. For the specification of vector/matrix/array 
dimensionality, the construct dim(E1,E2) is used, where the constant expressions E1 
and E2 define the lower and upper bound of one dimension of the data object. 

IDX_LIST ::= 
      IDX 
    | IDX , IDX_LIST 

IDX ::= 
    idx( IDENT , EXPR , EXPR ) 

DIM_LIST ::= 
      DIM 
    | DIM , DIM_LIST 

DIM ::= 
    dim( EXPR , EXPR ) 

The IDENT is the loop variable; the EXPRs are the lower bound and upper bound, 
respectively. 

A.4 Attributes 

Attributes are opaque lists of terms used for various purposes, like attachments of 
comments or explanations or parameters (like target system, optimization level). 

ATTR ::= 
      [] 
    | [ LIST_OF_ATTR ] 

LIST_OF_ATTR ::= 
      AT 
    | AT , LIST_OF_ATTR 

Example attributes which are currently being used are: 

AT ::= 
      file(IDENT) 
    | target_language(LANGUAGE) 
    | indent(NUMBER) 
    | verbosity(NU) 
    | linewidth(NU) 
    | pedantic 
    | target(TARGET) 
    | comment(COMMENT) 
    | initialize(EXPR) 

LANGUAGE ::= 
    c | cplusplus 

TARGET ::= 
    matlab | octave 


file The code-generation module will output the resulting code into the given file. 
This attribute is only evaluated on the top-level attribute list of the prog. 

target_language Select a target language for the code to be generated (overridden by 
selection of the target system). This attribute is only evaluated on the top-level 
attribute list of the prog. 

indent indentation level for formatting (default: 2). This attribute is only evaluated 
on the top-level attribute list of the prog. 

linewidth maximal length of a line in produced output code (default: 80). This 
attribute is only evaluated on the top-level attribute list of the prog. 

verbosity This is the verbosity level of the code-generation subsystem. 

initialize This attribute is used for the declaration part only. A scalar variable 
is initialized to the value given by EXPR. EXPR must be a simple expression, 
i.e., it must not contain any pseudo-code instructions which evaluate into 
statements (like sum, norm, ...). 

comment Comments can be an atom or a list of atoms. Long lines are broken up into 
several shorter lines. Comments can have the following control atoms (must be 
present as single atoms): 

\n forces an immediate line break. 

labelref(label) prints a reference to the label label defined elsewhere. 

label(label) defines a label for later reference. In the current version, a label is 
printed as an additional comment. 

In the current version, only the following attributes are evaluated for each statement: 
comment, label. 

A.5 Statements STMT 

STMT ::= 
      SERIES 
    | BLOCK 
    | FOR_LOOP 
    | IFSTAT 
    | ASSIGN 
    | WHILE 
    | ASSERT 
    | CALL_STAT 
    | CONVERGING 
    | ANNOTATION 
    | FAIL 
    | SKIP 

STMT_LIST ::= 
      STMT 
    | STMT , STMT_LIST 

A.5.1 fail and skip 

fail generates a run-time error and/or exception and aborts processing of that 
function. skip does nothing. 

FAIL ::= 
    fail(ATTR) 

SKIP ::= 
      skip(ATTR) 
    | skip 

A.5.2 Sequential Composition 

SERIES ::= 
    series( [ STMT_LIST ] , ATTR ) 

BLOCK ::= 
    block( local([ DECL_LIST ]) , STMT , ATTR ) 

A.5.3 Annotations 

ANNOTATION ::= 
    annotation( TERM ) 

Annotations are placed "as is" into the code. 

A.5.4 For-Loops 

FOR_LOOP ::= 
    for( [ IDX_LIST ] , STAT , ATTR ) 

A.5.5 If-then-else 

IFSTAT ::= 
    if( EXPR , STAT , STAT , ATTR ) 

A.5.6 While-Converging 

CONVERGING ::= 
    while_converging( [ VECTORLIST ] , EXPR , STAT , ATTR ) 

Change: The EXPR evaluates to the tolerance down to which the iteration is to be 
performed. 

VECTORLIST ::= 
      VECTORDECL 
    | VECTORDECL , VECTORLIST 

A.5.7 While and Repeat Loop 

WHILE ::= 
    while( EXPR , STAT , ATTR ) 

REPEAT ::= 
    repeat( EXPR , STAT , ATTR ) 

A.5.8 Assertion 

ASSERT ::= 
    assert( EXPR , TERM , ATTR ) 

Changes: This assert is to be used instead of the construct if(expr, stat, fail). 
The TERM is opaque and will be used in conjunction with explanation techniques. 
A.5.9 Assignment Statement 

ASSIGN ::= 
      SIMPLE_ASSIGN 
    | MULTIPLE_ASSIGN 
    | SIMUL_ASSIGN 
    | COMPOUND_ASSIGN 

SIMPLE_ASSIGN ::= 
    assign( LVALUE , EXPR , ATTR ) 

MULTIPLE_ASSIGN ::= 
    assign_multiple( LVALUE_LIST , EXPR , ATTR ) 

SIMUL_ASSIGN ::= 
    assign_simul( LVALUE_LIST , EXPR , ATTR ) 

COMPOUND_ASSIGN ::= 
    assign_compound( [ IDX_LIST ] , LVALUE , EXPR , ATTR ) 

Note: the compound assignment will not be available in the current version. 

LVALUE ::= 
      VAR 
    | VAR ( EXPR_LIST ) 

A value gets assigned to a scalar variable or an array access. 

A.5.10 Misc. Statements 

SOLVER_STAT ::= 
      unsolved( LABEL , STAT ) 
    | poly_solver( ... ) 

A.6 Expression EXPR 

EXPR_LIST ::= 
      EXPR 
    | EXPR , EXPR_LIST 

EXPR ::= 
      NUMERIC_CONSTANT 
    | CONSTANT 
    | VAR 
    | VAR ( EXPR_LIST ) 
    | - EXPR 
    | PRE_OP 
    | EXPR OP EXPR 
    | SUM_EXPR 
    | NORM_EXPR 
    | MAXARG_EXPR 
    | ( EXPR ) 
    | NUMFUNC 
    | BOOLFUNC 
    | CONDEXPR 
    | attr( EXPR , ATTR ) 

The attr can be used to give attributes to atomic expressions and/or expressions 
without a leading function symbol. 

NUMERIC_CONSTANT ::= 
      0 | 1 | ... 
    | FLOAT 
    | pi 

CONSTANT ::= 
    identifier 

VAR ::= 
    identifier 

OP ::= 
    + | - | * | ** | / 

PRE_OP ::= 
      sdiv(EXPR, EXPR) 
    | ssqrt(EXPR) 
    | slog(EXPR) 

The operators sdiv, ssqrt, slog are safe extensions of the usual operators. The 
code generator will generate a check for validity and the desired operation, using a 
newly introduced variable to avoid multiple copies of the expressions. 

Note: the usual infix operators with the usual operator precedence as well as prefix 
notation (e.g., '+'(X,Y)) can be used. 
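
What a guarded operator amounts to can be sketched as follows. This Python code is not output of the AutoBayes code generator, which emits C/C++; it only illustrates the pattern "check validity, then perform the operation". Raising the hypothetical exception `UnsafeOperation` on a failed check is an assumption, since the text above does not say how the generated code reacts.

```python
import math


class UnsafeOperation(ArithmeticError):
    """Raised when a guarded operation is applied outside its domain."""


# Python evaluates each argument exactly once before the call, which plays
# the role of the newly introduced variable mentioned in the text.

def sdiv(num, den):
    """Safe division: check the denominator before dividing."""
    if den == 0:
        raise UnsafeOperation("division by zero")
    return num / den


def ssqrt(x):
    """Safe square root: reject negative arguments."""
    if x < 0:
        raise UnsafeOperation("square root of a negative value")
    return math.sqrt(x)


def slog(x):
    """Safe logarithm: reject non-positive arguments."""
    if x <= 0:
        raise UnsafeOperation("logarithm of a non-positive value")
    return math.log(x)


if __name__ == "__main__":
    print(sdiv(1, 4), ssqrt(9.0), slog(1.0))
```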

A.6.1 Boolean Expressions 

BOOLFUNC ::= 
      nonzero( EXPR ) 
    | true 
    | false 
    | EXPR RELOP EXPR 

RELOP ::= 
    < | =< | > | >= | == | != 

Note: the "less than or equal" operator is written =< to conform to the Prolog standard. 

Note: The operation nonzero has been introduced for handling numerical instability. 
Whereas EXPR != 0 really checks for being different from 0, nonzero(EXPR) just checks if 
the absolute value of EXPR is larger than some given ε. 
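
The difference between the two tests shows up as soon as rounding errors are involved. In the following Python sketch the tolerance `EPSILON` is an assumed value chosen for the illustration; the text above only speaks of "some given ε".

```python
EPSILON = 1e-12  # assumed tolerance; not a value taken from AutoBayes


def nonzero(x, eps=EPSILON):
    """True if |x| is larger than the tolerance eps."""
    return abs(x) > eps


if __name__ == "__main__":
    residue = 0.1 + 0.2 - 0.3   # mathematically zero, but not in floating point
    print(residue != 0)          # exact comparison reports a non-zero value
    print(nonzero(residue))      # tolerance-based test treats it as zero
```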

A.6.2 Numeric expressions and functions 

NUMFUNC ::= 
      sqrt( EXPR ) 
    | exp( EXPR ) 
    | sin( EXPR ) 
    | abs( EXPR ) 
    | random 
    | random_int( EXPR , EXPR ) 

The function random returns a pseudo-random number between 0 and 1; random_int 
returns a pseudo-random integer in the given range. 

A.6.3 Summation expression 

SUM_EXPR ::= 
    sum( [ IDX_LIST ] , STAT , ATTR ) 

A.6.4 Indexed Expressions 

IDX_EXPR ::= 
    select( IDENT , [ IDX_LIST ] ) 

A.6.5 Getting the Norm of an iteration 

NORM_EXPR ::= 
    norm( EXPR , [ IDX_LIST ] , EXPR , ATTR ) 

The intended meaning of this construct is to get the value of EXPR1 normalized by 
EXPR2. For example, 

    norm(v(i), [idx(j,1,N)], v(j), []) 

calculates $v(i) / \sum_{j=1}^{N} v(j)$. 
The expression norm(EXPR1, [IDX_LIST], EXPR2) unfolds into 

    EXPR1 / sum([IDX_LIST], EXPR2) 

The sum expression is usually constant with respect to EXPR1 (in our case, with 
respect to the index i); in fact this is the only case which makes sense. The sum can 
therefore be moved out of the for-loop. 

However, beware of the situation where you have: 

    for([idx(i,0,N)], 
        v(i) = norm(v(i), [idx(j,0,N)], v(j))) 

This would NOT correctly normalize that vector, because each iteration modifies v(i) 
and with it the sum. Care must be taken to normalize with the sum of the original 
vector. 
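
The pitfall is easy to reproduce. The Python sketch below contrasts an in-place loop that recomputes the sum in every iteration with a version that computes the sum once before the loop; the function names are made up for this illustration.

```python
def normalize_in_place_wrong(v):
    """Recomputes the sum from the partially updated vector (incorrect)."""
    for i in range(len(v)):
        v[i] = v[i] / sum(v)
    return v


def normalize(v):
    """Computes the sum once, before any element is changed (correct)."""
    total = sum(v)  # constant w.r.t. i, so it is hoisted out of the loop
    return [x / total for x in v]


if __name__ == "__main__":
    print(normalize([1.0, 1.0, 2.0]))
    print(normalize_in_place_wrong([1.0, 1.0, 2.0]))
```

Only the second function returns a vector whose elements add up to 1; in the first one, every element after the first is divided by a sum that already contains normalized entries.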

A.6.6 Maxarg 

MAXARG_EXPR ::= 
    maxarg( [ IDX_LIST ] , EXPR , ATTR ) 

Determines the index where EXPR takes its maximal value. 

A.6.7 Conditional expressions 

CONDEXPR ::= 
    cond( EXPR , EXPR , EXPR ) 

A list of all pragmas formatted in LaTeX can be generated by autobayes -tex -help 
pragmas. This is a subset. 

cg_comment_style (atomic) select comment style for C/C++ code generator 
    Default: -pragma cg_comment_style=cpp 
    Possible values: 
        kr   use traditional (KR) style comments 
        cpp  use C++ style comments // 

cluster_pref (atomic) select algorithm schemas for hidden-variable (clustering) 
problems 
    Default: -pragma cluster_pref=em 
    Possible values: 
        em       prefer EM algorithm 
        no_pref  no preference 
        k_means  use k-means algorithm 

codegen_ignore_inconsistent_term (boolean) [DEBUG] ignore inconsistent-term 
conditional expressions in codegen 
    Default: -pragma codegen_ignore_inconsistent_term=false 

em (atomic) preference for initialization algorithm for EM 
    Default: -pragma em=no_pref 
    Possible values: 
        no_pref      no preference 
        center       center initialization 
        sharp_class  class-based initialization (sharp) 
        fuzzy_class  class-based initialization (fuzzy) 

em_log_likelihood_convergence (boolean) converge on log-likelihood function 
    Default: -pragma em_log_likelihood_convergence=false 

em_q_output (boolean) output the Q matrix of the EM algorithm 
    Default: -pragma em_q_output=false 

em_q_update_simple (boolean) force the q-update to just contain the density 
function 
    Default: -pragma em_q_update_simple=false 

ignore_division_by_zero (boolean) DEBUG: do not check for X=0 in X**(-1) 
expressions 
    Default: -pragma ignore_division_by_zero=false 

ignore_zero_base (boolean) DEBUG: do not check for zero base in X**Y 
expressions 
    Default: -pragma ignore_zero_base=false 

infile_cpp_prefix (atomic) prefix for intermediate input file after cpp(1) 
processing 
    Default: -pragma infile_cpp_prefix=cpp_ 

instrument_convergence_save_ub (integer) default size of instrumentation vector 
for convergence loops 
    Default: -pragma instrument_convergence_save_ub=999 

lopt (boolean) turn on/off optimization of the lang code 
    Default: -pragma lopt=false 

optimize_cse (boolean) enable common subexpression elimination 
    Default: -pragma optimize_cse=true 

optimize_expression_inlining (boolean) enable inlining (instead of function calls) 
of goal expressions by schemas 
    Default: -pragma optimize_expression_inlining=true 

optimize_max_unrolling_depth (int) maximal depth of for-loops w/ constant bound 
to be unrolled 
    Default: -pragma optimize_max_unrolling_depth=3 

optimize_memoization (boolean) enable subexpression memoization 
    Default: -pragma optimize_memoization=true 

optimize_substitute_constants (boolean) allow values of constants to be 
substituted into loop bounds 
    Default: -pragma optimize_substitute_constants=true 

rwr_cache_max (integer) size of rewrite cache 
    Default: -pragma rwr_cache_max=2048 

schema_control_arbitrary_init_values (boolean) enable initialization of goal 
variables w/ arbitrary start/step values 
    Default: -pragma schema_control_arbitrary_init_values=false 

schema_control_init_values (atomic) initialization of goal variables 
    Default: -pragma schema_control_init_values=automatic 
    Possible values: 
        automatic  calculate best values 
        arbitrary  use arbitrary values 
        user       user provides values (additional input parameters) 

schema_control_solve_partial (boolean) enable partial symbolic solutions 
    Default: -pragma schema_control_solve_partial=true 

schema_control_use_generic_optimize (boolean) enable intermediate code 
generation w/ generic optimize(...) statements 
    Default: -pragma schema_control_use_generic_optimize=false 

synth_serialize_maxvars (integer) maximal number of solved variables eliminated 
by serialize 
    Default: -pragma synth_serialize_maxvars=0 

C.1 Simple AutoBayes Problem 

C.1.1 Specification 

Throughout the text, the following simple AutoBayes specification is used 
(Listing C.1). Section C.1.2 shows the autogenerated derivation for this problem. The 
entire LaTeX document has been generated except for the red lines. 

Notes: 

• The original specification uses mu, sigma, etc. The LaTeX output automatically 
converts most Greek names into Greek symbols; variable names ending in _sq are 
converted into squares. 

• Upper-case symbols can be used in specifications when the flag prolog_style 
is set to false. 

• LaTeX output is produced using the -tex synt command-line option. 

• The current version typesets the entire program (code and comments); for the 
derivation below, only the comments were extracted (manually). 

    model normal_simple as 'Normal model without priors'. 

    double mu. 
    double sigma_sq as 'sigma squared'. 
    where 0 < sigma_sq. 

    const nat n as '# data points'. 
    where 0 < n. 

    data double x(0..n-1) as 'known data points'. 
    x(_) ~ gauss(mu, sqrt(sigma_sq)). 

    max pr(x | {mu, sigma_sq}) for {mu, sigma_sq}. 

Listing C.1: Simple AutoBayes specification 

C.1.2 Autogenerated Derivation 

begin autogenerated document 

[max pr(x|mu,s) for mu,s] 

The conditional probability $P(x \mid \mu, \sigma^2)$ is under the dependencies given in the model 
equivalent to 

\[ \prod_{i=0}^{-1+n} P(x_i \mid \mu, \sigma^2) \] 

[schema: prob-2-formula] 

The probability occurring here is atomic and can thus be replaced by the respective 
probability density function given in the model. This yields the log-likelihood function 

[PDF (gauss) = 1/..* exp(...)] 

\[ \log \prod_{i=0}^{-1+n} \frac{1}{\sqrt{2\pi}\,(\sigma^2)^{1/2}} \exp\left( -\frac{1}{2}\,\frac{(x_i - \mu)^2}{\sigma^2} \right) \] 

which can be simplified to 

\[ -\frac{1}{2}\, n \log 2 - \frac{1}{2}\, n \log \pi - \frac{1}{2}\, n \log \sigma^2 - \frac{1}{2}\, (\sigma^2)^{-1} \sum_{i=0}^{-1+n} (-\mu + x_i)^2 \] 

This function is then optimized w.r.t. the goal variables $\mu$ and $\sigma^2$. 

[solves the maximization task] 

The summands 

\[ -\frac{1}{2}\, n \log 2, \qquad -\frac{1}{2}\, n \log \pi \] 

[optimization] 

are constant with respect to the goal variables $\mu$ and $\sigma^2$ and can thus be ignored for 
maximization. 

The factor 

\[ \frac{1}{2} \] 

is non-negative and constant with respect to the goal variables $\mu$ and $\sigma^2$ and can thus 
be ignored for maximization. 

The function 

\[ -n \log \sigma^2 - (\sigma^2)^{-1} \sum_{i=0}^{-1+n} (-\mu + x_i)^2 \] 

is then symbolically maximized w.r.t. the goal variables $\mu$ and $\sigma^2$. The partial 
differentials 

[text-book like: set first derivative = 0 and solve] 

\[ \frac{\partial f}{\partial \mu} = -2 \mu n (\sigma^2)^{-1} + 2 (\sigma^2)^{-1} \sum_{i=0}^{-1+n} x_i \] 

\[ \frac{\partial f}{\partial \sigma^2} = -n (\sigma^2)^{-1} + (\sigma^2)^{-2} \sum_{i=0}^{-1+n} (-\mu + x_i)^2 \] 

are set to zero; these equations yield the solutions 

[solver can symbolically solve] 

\[ \mu = n^{-1} \sum_{i=0}^{-1+n} x_i \] 

\[ \sigma^2 = n^{-1} \sum_{i=0}^{-1+n} (-\mu + x_i)^2 \] 

end autogenerated document 
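
The closed-form solution can be checked numerically. The Python sketch below is not the code synthesized by AutoBayes; it evaluates the Gaussian log-likelihood directly and confirms on a small, made-up data set that the sample mean and the mean squared deviation give a larger value than nearby parameter settings. The function names and the data are chosen for this illustration only.

```python
import math


def log_likelihood(x, mu, sigma_sq):
    """Gaussian log-likelihood of the data x for mean mu and variance sigma_sq."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma_sq)
            - 0.5 / sigma_sq * sum((xi - mu) ** 2 for xi in x))


def closed_form(x):
    """Maximum-likelihood estimates: sample mean and mean squared deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma_sq = sum((xi - mu) ** 2 for xi in x) / n
    return mu, sigma_sq


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0]
    mu, sigma_sq = closed_form(data)
    print(mu, sigma_sq, log_likelihood(data, mu, sigma_sq))
```

Note that the variance estimate divides by n, not by n - 1; this is what maximizing the likelihood yields and it matches the solution of the derivation.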
D.1 Running AutoBayes 

D.1.1 Exercise 1 

Run the norm.ab example and inspect the generated code and derivation. If possible, 
generate the LaTeX version of the derivation. 

    model normal_simple as 'Normal model without priors'. 

    double mu. 
    double sigma_sq as 'sigma squared'. 
    where 0 < sigma_sq. 

    const nat n as '# data points'. 
    where 0 < n. 

    data double x(0..n-1) as 'known data points'. 
    x(_) ~ gauss(mu, sqrt(sigma_sq)). 

    max pr(x | {mu, sigma_sq}) for {mu, sigma_sq}. 

Listing D.1: norm.ab 


D.1.2 Exercise 2 

Generate multiple versions for this problem. Note: use the appropriate flags to allow 
AutoBayes to generate numerical optimization algorithms 
(schema_control_arbitrary_init_values). 

D.1.3 Exercise 3 

Modify the “norm” example to use a different probability density function. Note that 
some of them have a different number of parameters. Inspect the generated code 
and derivation. Can the problem be solved symbolically for all PDFs? 

Hint: use vonmises1, poisson, weibull, cauchy 

D.1.4 Exercise 4 

Generate multiple versions for the mixture-of-Gaussians example. What are the major 
differences between the different synthesized programs? 

Note: the specification is mog.ab in the models manual directory. 

Generate a sampling data generator (autobayes -sample) for this specification. 

In AutoBayes, generate 1000 data points that go into 3 different classes. Then run the 
different programs and see how well they estimate the parameters. 

Note: the generated functions require column vectors, so, e.g., give the means as 
[1,2,3]' 

    octave-3.4.0:1> sample_mog 
    usage: [vector c, vector x] = sample_mog(vector mu, int n_points, vector 
           phi, vector sigma) 

    octave-3.4.0:2> [c,x] = sample_mog([1,2,4]', 1000, [0.3,0.1,0.6]', [0.1,0.1,0.2]'); 

Listing D.2: calling the synthesized code in Octave 


D.1.5 Exercise 5 

Run a change-point detection model (e.g., climb_transition.ab) and look at the 
generated code and derivation. How does AutoBayes find the maximum? 

D.1.6 Exercise 6 

Add the Pareto distribution to the built-in distributions. Get the formulas from Wikipedia. 

Try the following simple model: 

    model pareto as 'Normal model without priors'. 

    double alpha. 
    where 3 < alpha. 
    const double xm. 
    where 0 < xm. 

    const nat n as '# data points'. 
    where 0 < n. 

    data double x(0..n-1) as 'known data points'. 
    where 0 < x(_). 
    where xm < x(_). 

    x(_) ~ pareto(xm, alpha). 

    max pr(x | {alpha}) for {alpha}. 

Listing D.3: Specification for Pareto distribution 

    octave-3.4.0:2> xm = 5; 
    octave-3.4.0:3> alpha = 15; 
    octave-3.4.0:4> x = xm * (1 ./ (1 - rand(10000,1)).^(1/alpha)); 
    octave-3.4.0:5> alpha_est = pareto(x,5) 
    alpha_est = 15.081 

Listing D.4: Generate Pareto-distributed random numbers 
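
The experiment can be reproduced without AutoBayes or Octave. The Python sketch below draws Pareto-distributed samples with the same inverse-transform formula and estimates alpha with the textbook closed-form maximum-likelihood estimate for known xm, n / sum(log(x_i / xm)). Whether the synthesized pareto function computes exactly this expression is an assumption; the function names and the fixed seed are chosen for this illustration.

```python
import math
import random


def sample_pareto(xm, alpha, n, rng):
    """Inverse-transform sampling: x = xm / (1 - u)**(1/alpha), u in [0, 1)."""
    return [xm / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]


def estimate_alpha(x, xm):
    """Closed-form maximum-likelihood estimate of alpha for known xm."""
    return len(x) / sum(math.log(xi / xm) for xi in x)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the run is repeatable
    data = sample_pareto(5.0, 15.0, 10000, rng)
    print(estimate_alpha(data, 5.0))
```

With 10000 samples the estimate should land close to the true value of 15, in line with the Octave session above.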




Appendix E. Research Challenges and Programming 
Tasks 


E.1 PDFs 

E.1.1 Integrate $\chi^2$ PDF into AutoBayes 

The $\chi^2$ PDF is important to handle square errors of Gaussian distributed data. E.g., 
for $X, Y \sim N(\mu, \sigma^2)$ we get $X^2 + Y^2 \sim \chi^2(1)$. 

E.1.2 Integrate folded Gaussian PDF into AutoBayes 

The folded Gaussian PDF is important to handle problems with abs functions. For 
$X \sim N(0,1)$, we get $|X| \sim N_f(0)$. 

E.1.3 Integrate Tabular PDF into AutoBayes 

Handling of non-functional PDFs, e.g., for ground-cover clustering. The PDF is given 
as a vector over the data X, e.g., as X ~ tab(p) where const double p(0..n-1). and 
where 0 = sum(I := 0..n-1, p(I)) - 1. 

Normalization is important. 

E.2 Gaussian with full covariance 

Currently, AutoBayes can only handle Gaussian distributions with a diagonal 
covariance matrix, i.e., $\Sigma_{ij} = 0$ for $i \neq j$. 

This could be implemented as a separate PDF, or the dimensionality could be inferred 
from the declaration of the sigmas. 

Requires 3-dim arrays for multivariate clustering. 

E.3 Preprocessing 

E.3.1 Normalization of Data 

Develop a schema for the normalization of data toward 0..1 or $N(0, 1)$. For a Gaussian 
PDF, $aX + b \sim N(a\mu + b, a^2\sigma^2)$. 

E.3.2 PCA for multivariate data 

This preprocessing cuts down the number of dimensions according to a given goal 
(threshold on the eigenvalues or reduction of dimensions). The rnd-projection schema 
could be used for this. 

Note that after clustering, the resulting parameters must be mapped back to the 
original space. 

E.4 Clustering 

E.4.1 KD-tree Schema 

must be dug up and integrated (Alex Grey(?)) 

E.4.2 EM schema with empty classes

The current EM algorithm fails if one or more classes become empty. The EM schema
must be extended to enable handling this case. Since we cannot dynamically resize the
data structures, an index vector (e.g., valid-classes) must be carried along. Refactoring
of the EM schema might be a good idea.
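
The index-vector idea can be sketched for the M-step of a one-dimensional Gaussian mixture as follows. This is a minimal illustration under stated assumptions, not the AutoBayes EM schema: the names `m_step`, `resp`, `valid`, and the threshold `eps` are hypothetical.

```python
def m_step(data, resp, valid, eps=1e-8):
    """M-step for a 1-D Gaussian mixture that tolerates empty classes.

    resp[i][c] is the responsibility of class c for data[i];
    valid[c] is the fixed-size index vector of classes still in use.
    Returns {c: (mixing weight, mean, variance)} for valid classes only.
    """
    n = len(data)
    params = {}
    for c in range(len(valid)):
        if not valid[c]:
            continue  # class was dropped in an earlier iteration
        weight = sum(r[c] for r in resp)
        if weight < eps:
            valid[c] = False  # class became empty: mark it, do not resize
            continue
        mu = sum(r[c] * x for r, x in zip(resp, data)) / weight
        var = sum(r[c] * (x - mu) ** 2 for r, x in zip(resp, data)) / weight
        params[c] = (weight / n, mu, var)
    return params
```

The E-step would then have to skip invalid classes as well, so that the responsibilities are renormalized over the remaining classes.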

E.4.3 Clustering with unknown number of classes

A very simple approach is to develop a schema that executes a for-loop over
the number of classes and returns the parameters of the run with the maximum
likelihood.

The spec gives the range of class numbers. 

Extensions: run the algorithm multiple times (with different random initializations)
for each n_classes.
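
The control structure of such a schema, including the restart extension, might look like the following Python sketch. The clustering routine is passed in as `fit`, a hypothetical callable assumed to return a pair (parameters, log-likelihood); it stands in for whatever clustering algorithm is actually synthesized.

```python
def best_clustering(data, class_range, fit, n_restarts=1):
    """Try every class count in class_range, each with n_restarts random
    initializations, and keep the run with the highest log-likelihood.

    fit(data, n_classes, seed) is an assumed interface returning
    (params, loglik); it is not an AutoBayes function.
    """
    best = None
    for n_classes in class_range:
        for restart in range(n_restarts):
            params, loglik = fit(data, n_classes, seed=restart)
            if best is None or loglik > best[2]:
                best = (n_classes, params, loglik)
    return best  # (n_classes, params, loglik), or None if nothing was run
```

Note that the raw likelihood usually grows with the number of classes, which is one reason to compare runs with a different quality metric.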

This should be combined with different quality-of-clustering metrics.

E.4.4 Quality-of-clustering metrics

Currently, AutoBayes stops clustering when a given tolerance is reached. It then
returns the log-likelihood as the only quality metric.

The literature describes a large number of different quality metrics for clustering.
Develop schemas for calculating one or more of these metrics after each run of a
clustering algorithm.
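
As one example of a metric that can be computed directly from the returned log-likelihood, the Bayesian information criterion (BIC) penalizes the number of free parameters. It is named here only as an illustration and is not prescribed by the text above; the function name `bic` is hypothetical.

```python
import math

def bic(loglik, n_params, n_points):
    """Bayesian information criterion: n_params * ln(n_points) - 2 * loglik.

    Lower values indicate a better trade-off between fit and model size.
    """
    return n_params * math.log(n_points) - 2.0 * loglik
```

For a one-dimensional Gaussian mixture with k classes, a common parameter count is 3k - 1 (k means, k variances, and k - 1 free mixing weights).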

E.4.5 Regression Models 

E.5 Specification Language 

E.5.1 Improved Error Handling 

E.6 Code Generation 

E.6.1 R Backend 

R is a popular language for statistics. An interface from AutoBayes to R is
therefore important for increasing the usability of AutoBayes.

E.6.2 Arrays in Matlab

Currently, all matrices and arrays are linearized and the access is done using a macro.
I.e., $x_{ij}$ is implemented as *(x+i*N+j).

Access using a vectorized linearization is still to be implemented.
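
The index arithmetic behind the macro can be made explicit with a small Python sketch. The row-major formula matches the expression *(x+i*N+j) above; the column-major variant is shown because Matlab stores matrices column by column. Both function names are illustrative.

```python
def row_major(i, j, n_cols):
    """Offset of element (i, j) when rows are stored one after another,
    as in the macro *(x + i*N + j) with N = n_cols."""
    return i * n_cols + j

def col_major(i, j, n_rows):
    """Offset of element (i, j) when columns are stored one after another
    (Matlab's native layout), with zero-based indices."""
    return i + j * n_rows
```

A backend that hands linearized arrays to Matlab has to agree with Matlab on which of the two layouts is in use, otherwise the matrix arrives transposed.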

E.6.3 Java Backend

Develop a backend for stand-alone Java. This requires the definition of a suitable data
structure for arrays (and their allocation) and the handling of multiple arguments and
return values.

E.6.4 C stand-alone

The C stand-alone code generator must be debugged and improved.

E.6.5 Code Generator Extensions for functions/procedures

E.7 Numerical Optimization 

E.7.1 GPL library 

E.7.2 Multivariate Optimization

Add a schema library for multivariate optimization.

E.7.3 Optimizations under Constraints

Trust-region algorithms.

E.8 Symbolic 

E.8.1 Handling of Constraints 

E.9 Internal Clean-ups 

E.9.1 Code generation 

cg_compoundexpr.pl 

E.10 Major Debugging 

E.10.1 Fix all Kalman-oriented examples 

E.11 Schema Control Language

E.12 Schema Surface Language 

E.12.1 Domain-specific surface language for schemas 

E.12.2 Visualization of Schema-hierarchy

E.12.3 Schema debugging and Development Environment 

E.13 AutoBayes QA 

E.14 AutoBayes/AutoFilter 

Implementation of Particle Filters for Health Management 



